A Text Recognition Procedure for Cryptanalysis

BY C. V. KIMBALL 
Unclassified 

Presented before the International Symposium in Information Theory 
at UCLA, 31 January-2 February 1966, this paper reveals the character 
and quality of work in cryptanalysis now in progress in universities. 

It uses standard Bayesian methods equivalent to the common log scoring techniques with which our analysts are familiar. The conditions for acceptance or rejection of the null hypothesis are perhaps stated somewhat more explicitly, but no new concepts are involved. The solution given is for the easy part of the problem, with all probabilities, and even the a priori odds, assumed known. The really challenging task is to deduce useful weights, not from known or assumed probabilities, but from samples of the cipher or code. The problem treated is clearly and carefully stated, and the solution is thorough and competent.

Dr. B. C. Getchell, Pi 

INTRODUCTION 

This paper is a brief summary of a long-term study reported in detail in [1]. As a result, much of the underlying material is given only cursory treatment. The first section introduces the decision procedure used in the study and explains the importance of source redundancy in making rapid decisions. In the second section, the decision procedure is applied to the recognition of natural-language texts among random texts. In the third section, the recognition procedure is applied to a general cryptographic problem.

DEFERRED-DECISION PROCEDURE 

In the binary detection problem considered here, one of two sources, SN (signal plus noise) or N (noise alone), provides a stream of $M$ symbols to the decision device.¹ The decision device is to respond A if SN is present, or B if N is present. Decisions are made on the basis of sequential observations of successive symbols from the source. The two sources generate independent symbols taken from the same finite alphabet, $(Z_1, Z_2, \ldots, Z_K)$, according to known probability distributions which are conditional on the source. We let $\alpha_i$ be the probability of $Z_i$ when SN is present and let $\beta_i$ be the probability of $Z_i$ when N is present. In addition, we will assume that the a priori probabilities of SN and N are known.

¹ Much of the current work in deferred-decision theory involves the detection of signals in noise, and the symbols above are common.

For this problem, the deferred-decision procedure minimizes the expected loss under a predetermined loss rule. This loss rule assigns a cost $W_M$ to the response B when the SN source is present, and a cost $W_F$ to the response A when N is present. Also, a fixed cost, $C$, is charged for each observation. When the parameters $M$, $\{\alpha_i\}$, $W_F$, $W_M$, and $C$ are specified, the decision procedure can be found by using a computer-implemented optimization process.
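The optimization procedure itself is described in [1] and is not reproduced here. As a hedged sketch only, the Python below uses one standard formulation, finite-horizon Bayes backward induction, which may differ from the author's program. The state is the log-odds ratio $L$ of (1); at each stage the expected loss of answering now is compared with $C$ plus the expected loss of taking one more symbol. The function names, the grid limits `lo`/`hi`, and the grid `step` are assumptions introduced for illustration, and the returned pairs approximate the decision points $(A_m, \Gamma_m)$ only to the grid resolution.

```python
import math


def stop_loss(L, w_m, w_f):
    """Expected loss of answering now at log-odds L.

    Response A costs (1 - p) * W_F (false alarm); response B costs p * W_M (miss).
    """
    p = 1.0 / (1.0 + math.exp(-L))
    return min(p * w_m, (1.0 - p) * w_f)


def backward_induction(alpha, beta, w_m, w_f, c, M, lo=-40.0, hi=40.0, step=0.05):
    """Approximate decision points [(A_m, Gamma_m) for m = 0..M] on a grid of L values.

    Hypothetical helper: a generic Bayes sequential formulation, not the paper's code.
    """
    n = int(round((hi - lo) / step)) + 1
    grid = [lo + k * step for k in range(n)]
    lam = [math.log(a / b) for a, b in zip(alpha, beta)]  # Eq. (3)

    def interp(values, L):
        # Linear interpolation of the value function on the grid; beyond the grid
        # we assume the posterior is so extreme that answering at once is optimal.
        if L <= lo or L >= hi:
            return stop_loss(L, w_m, w_f)
        x = (L - lo) / step
        k = min(int(x), n - 2)
        t = x - k
        return values[k] * (1.0 - t) + values[k + 1] * t

    terminal = math.log(w_f / w_m)  # after M symbols the device must answer
    pairs = [None] * (M + 1)
    pairs[M] = (terminal, terminal)
    values = [stop_loss(L, w_m, w_f) for L in grid]  # value function at m = M

    for m in range(M - 1, -1, -1):
        new_values, go_on = [], []
        for L in grid:
            p = 1.0 / (1.0 + math.exp(-L))
            # Pay C for one more symbol Z_i (prob. p*alpha_i + (1-p)*beta_i),
            # move to L + lambda(Z_i) as in Eq. (2), then act optimally.
            cont = c + sum((p * a + (1.0 - p) * b) * interp(values, L + lam_i)
                           for a, b, lam_i in zip(alpha, beta, lam))
            stop = stop_loss(L, w_m, w_f)
            if cont < stop:
                go_on.append(L)
            new_values.append(min(stop, cont))
        values = new_values
        # Continuation region (Gamma_m, A_m), accurate only to the grid step.
        pairs[m] = (max(go_on), min(go_on)) if go_on else (terminal, terminal)
    return pairs


if __name__ == "__main__":
    # Toy two-symbol example with made-up numbers (not taken from the paper).
    toy = backward_induction((0.8, 0.2), (0.5, 0.5),
                             w_m=10.0, w_f=1.0, c=0.01, M=5, step=0.1)
    for m, (A, G) in enumerate(toy):
        print(f"m={m}: A_m={A:+.2f}, Gamma_m={G:+.2f}")
```

In this formulation the a priori probability of SN enters only through the starting value $L_0$, so one set of pairs serves every prior. The English case ($K = 26$, $M = 36$) uses the same call, but the pure-Python loops are correspondingly slower than in the toy example.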

The log-odds-ratio transformation is helpful in describing the operation of the decision device. Let $m$ designate the number of observations that have been taken; then the log-odds ratio after $m$ observations is given by

$$L_m = \ln \frac{P(SN \mid m \text{ observations})}{1 - P(SN \mid m \text{ observations})}. \qquad (1)$$
When the state of the decision process is expressed in terms of $L_m$, Bayes' rule can be written in a form particularly convenient for analysis and implementation:

$$L_{m+1} = L_m + \lambda(Z_i). \qquad (2)$$

Here $L_{m+1}$ is the log-odds ratio after the symbol $Z_i$ is observed, and $\lambda(Z_i)$ is the log likelihood ratio of $Z_i$, given by

$$\lambda(Z_i) = \ln \frac{\alpha_i}{\beta_i}. \qquad (3)$$
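Equation (2) is Bayes' rule in log-odds form. A brief derivation, relying only on the stated assumption that successive symbols are independent given the source: the posterior odds after the next symbol $Z_i$ are the previous odds multiplied by the likelihood ratio,

$$\frac{P(SN \mid m+1 \text{ obs.})}{P(N \mid m+1 \text{ obs.})} = \frac{P(Z_i \mid SN)}{P(Z_i \mid N)} \cdot \frac{P(SN \mid m \text{ obs.})}{P(N \mid m \text{ obs.})} = \frac{\alpha_i}{\beta_i} \cdot \frac{P(SN \mid m \text{ obs.})}{P(N \mid m \text{ obs.})},$$

and since $P(N \mid \cdot) = 1 - P(SN \mid \cdot)$, taking logarithms with definition (1) gives $L_{m+1} = L_m + \ln(\alpha_i/\beta_i) = L_m + \lambda(Z_i)$.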
The decision process is usually considered in terms of $L_m$. The decision function is represented by $M + 1$ pairs, $(A_m, \Gamma_m)$, of decision points. If $L_m \ge A_m$, the decision is A; if $L_m \le \Gamma_m$, the decision is B. If $A_m > L_m > \Gamma_m$, another observation is taken and the process continues. Fig. 1 is helpful in visualizing the operation of the device.
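The operation of the device can be sketched in a few lines of Python. This is an illustration under assumptions: the function name `run_device` is hypothetical, the decision points in the example are made-up values rather than the optimized pairs of Fig. 1, and the last pair is assumed to satisfy $A_M = \Gamma_M$ so that a response is forced after $M$ observations.

```python
import math


def run_device(symbols, lam, A, G, L0):
    """Sequential decision device using Eq. (2) and decision points (A_m, Gamma_m).

    symbols: observed sequence with at least M symbols
    lam:     mapping symbol -> lambda(Z), the log likelihood ratio of Eq. (3)
    A, G:    lists of length M + 1; A[M] == G[M] is assumed so a decision is forced
    Returns (response, number of observations used).
    """
    M = len(A) - 1
    assert A[M] == G[M], "the last pair must force a decision"
    L = L0
    for m in range(M + 1):
        if L >= A[m]:
            return "A", m
        if L <= G[m]:
            return "B", m
        L += lam[symbols[m]]  # Eq. (2): add lambda of the next observed symbol


if __name__ == "__main__":
    beta = 1 / 26  # uniform N source
    alpha = {"E": 0.1311, "T": 0.1047, "Z": 0.0008}  # values from Table 1
    lam = {z: math.log(a / beta) for z, a in alpha.items()}
    # Hypothetical decision points for M = 3.
    print(run_device("EEE", lam, [2.0, 2.0, 2.0, 0.0], [-2.0, -2.0, -2.0, 0.0], 0.0))
```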

As background, let us consider an important theorem for the case in which the symbols from the N source have a uniform distribution, that is, in which $\beta_i = 1/K$ for all $i$. For sequential procedures, as is well known ([2], [3]), the speed with which decisions are made depends on $\Delta\mu$, the difference in the mean values of the log likelihood ratio under SN and under N:

$$\Delta\mu = E(\lambda \mid SN) - E(\lambda \mid N). \qquad (4)$$

The following theorem relates $\Delta\mu$ to the Shannon redundancy of the SN source.

Theorem

$$\Delta\mu \ge \ln 2 \left[ \log_2 K + \sum_{i=1}^{K} \alpha_i \log_2 \alpha_i \right] \qquad (5)$$

$$\ge \ln 2 \; R(SN). \qquad (6)$$
Thus one can often compare the detectability of two sources when only their respective redundancies are known.
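The bound (5) can be checked directly. A sketch of one derivation, assuming $\beta_i = 1/K$ and all $\alpha_i > 0$:

$$E(\lambda \mid SN) = \sum_{i=1}^{K} \alpha_i \ln(K\alpha_i) = \ln 2 \left[ \log_2 K + \sum_{i=1}^{K} \alpha_i \log_2 \alpha_i \right],$$

$$E(\lambda \mid N) = \sum_{i=1}^{K} \beta_i \ln\frac{\alpha_i}{\beta_i} = -\sum_{i=1}^{K} \beta_i \ln\frac{\beta_i}{\alpha_i} \le 0$$

by Gibbs' inequality, so $\Delta\mu = E(\lambda \mid SN) - E(\lambda \mid N) \ge E(\lambda \mid SN)$, which is (5). The bracketed term equals $\log_2 K - H(SN)$, where $H(SN) = -\sum_i \alpha_i \log_2 \alpha_i$ is the single-symbol entropy in bits. If $R(SN)$ is read as this absolute redundancy, (6) holds with equality; if it is read as the relative redundancy $1 - H(SN)/\log_2 K$, (6) holds because $\log_2 K \ge 1$ for $K \ge 2$.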

THE RECOGNITION PROCEDURE 

The problem considered here is that of recognizing natural-language text among random texts in which symbols are uniformly distributed. Natural-language text is defined as a meaningful sequence of symbols from a written language, that is, letters spelling words which together convey a meaning. A deferred-decision procedure is used to recognize the natural language rapidly while achieving optimum loss performance.

The deferred-decision procedure can be used only when the distributions of the symbols for random and natural-language texts differ. The theorem in the first section suggests that the redundancy of natural-language text is an excellent measure of the difference between the two distributions. Since the single-symbol redundancy of most natural languages is of the order of 15 percent, the deferred-decision procedure provides an effective recognition technique. For purposes of analysis, we have considered the problem of recognizing English text; of course, the procedure can be applied to other written languages.
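The redundancy figure and the theorem can be checked numerically. The hedged sketch below computes, from the single-letter probabilities of Table 1 (the G entry read as 0.0199, the value consistent with its listed $\lambda$ and with the probabilities summing to one), the entropy, the absolute and relative single-letter redundancy, and $\Delta\mu$ of (4) against a uniform 26-letter N source. The function name `redundancy_and_separation` is hypothetical. For these probabilities the relative redundancy comes out a little over 10 percent, in line with the order-of-magnitude figure above.

```python
import math

# Single-letter probabilities alpha_i for English from Table 1 (G read as 0.0199).
ALPHA = {
    "A": 0.0815, "B": 0.0144, "C": 0.0276, "D": 0.0379, "E": 0.1311, "F": 0.0292,
    "G": 0.0199, "H": 0.0526, "I": 0.0635, "J": 0.0013, "K": 0.0042, "L": 0.0339,
    "M": 0.0254, "N": 0.0710, "O": 0.0800, "P": 0.0198, "Q": 0.0012, "R": 0.0683,
    "S": 0.0610, "T": 0.1047, "U": 0.0246, "V": 0.0092, "W": 0.0154, "X": 0.0017,
    "Y": 0.0198, "Z": 0.0008,
}


def redundancy_and_separation(alpha):
    """Return (H in bits, absolute redundancy in bits, relative redundancy,
    delta_mu in nats) for SN = alpha against a uniform N source (beta_i = 1/K)."""
    K = len(alpha)
    beta = 1.0 / K
    H = -sum(a * math.log2(a) for a in alpha.values())
    abs_red = math.log2(K) - H
    rel_red = abs_red / math.log2(K)
    lam = {z: math.log(a / beta) for z, a in alpha.items()}  # Eq. (3)
    e_sn = sum(a * lam[z] for z, a in alpha.items())          # E(lambda | SN)
    e_n = sum(beta * lam[z] for z in alpha)                   # E(lambda | N)
    return H, abs_red, rel_red, e_sn - e_n                    # Eq. (4)


if __name__ == "__main__":
    H, abs_red, rel_red, delta_mu = redundancy_and_separation(ALPHA)
    print(f"H = {H:.3f} bits, R = {abs_red:.3f} bits ({100 * rel_red:.1f}%)")
    print(f"delta_mu = {delta_mu:.3f} nats; bound (5) = {math.log(2) * abs_red:.3f}")
```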

The deferred-decision procedure requires that the loss parameters $W_F$, $W_M$, and $C$ be specified; they, in turn, depend on the problem at hand. Here we will use the loss conditions that arise in the application of the procedure to cryptography, described in the next section. In this application the loss structure has the following characteristics.

(1) The cost of a miss, $W_M$, is much greater than that of a false alarm, $W_F$.

(2) The cost of observation for a single symbol is much smaller than 
that of either kind of error loss. 

The present analysis is based on the loss ratios $W_M/W_F = 500$ and $W_F/C = 10{,}000$, and yields representative results for this loss structure.
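One pair of decision points follows directly from the loss rule. As a worked illustration (a consequence of the loss rule as stated, not a value read from the paper's figures): after the last observation the device must respond, and with $p = P(SN \mid M \text{ observations})$ the expected loss is $(1-p)W_F$ for response A and $pW_M$ for response B. Response A is therefore preferred exactly when

$$p\,W_M > (1-p)\,W_F \iff L_M > \ln\frac{W_F}{W_M} = -\ln 500 \approx -6.21,$$

so in this formulation the final pair would collapse to $A_M = \Gamma_M \approx -6.21$. The earlier pairs depend on $C$ and on the number of observations remaining, and require the full optimization.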

The deferred-decision procedure was analyzed under the above loss conditions for the case in which 36 observations were available to the decision device; that is, $M = 36$. The probabilities $\alpha_i$ were taken as the single-letter probabilities for English and are shown with their corresponding $\lambda$'s in Table 1. Figure 2 depicts the decision functions obtained and the motion of the log-odds ratio for two texts. The first text is from a well-known news magazine; the second has been derived with the aid of a table of random units. The decision procedure uses an a priori probability of English of 0.000002. The theoretical performance of the procedure under these conditions is given as a function of the a priori probability of SN in Table 2, where $\nu_{SN}$ and $\nu_N$ designate the expected number of observations in SN and N, respectively. From these results we conclude that English text can be recognized very rapidly by the deferred-decision procedure.
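For reference, the a priori probability of English used above corresponds, through (1), to the starting value

$$L_0 = \ln\frac{0.000002}{1 - 0.000002} \approx -13.12,$$

and each observed symbol then moves $L_m$ by the corresponding $\lambda(Z_i)$ of Table 1, the largest single upward step being about $+1.23$ for E.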



Symbol, Z_i   α_i      β_i      λ(Z_i)
A             0.0815   0.0385   +0.7510
B             0.0144   0.0385   -0.9817
C             0.0276   0.0385   -0.3322
D             0.0379   0.0385   -0.0149
E             0.1311   0.0385   +1.2258
F             0.0292   0.0385   -0.2737
G             0.0199   0.0385   -0.6564
H             0.0526   0.0385   +0.3131
I             0.0635   0.0385   +0.5008
J             0.0013   0.0385   -3.3720
K             0.0042   0.0385   -2.2145
L             0.0339   0.0385   -0.1262
M             0.0254   0.0385   -0.4161
N             0.0710   0.0385   +0.6129
O             0.0800   0.0385   +0.7316
P             0.0198   0.0385   -0.6624
Q             0.0012   0.0385   -3.4590
R             0.0683   0.0385   +0.5747
S             0.0610   0.0385   +0.4616
T             0.1047   0.0385   +1.0011
U             0.0246   0.0385   -0.4469
V             0.0092   0.0385   -1.4315
W             0.0154   0.0385   -0.9153
X             0.0017   0.0385   -3.1428
Y             0.0198   0.0385   -0.6624
Z             0.0008   0.0385   -3.9110

Table 1. Symbol probabilities for English and random texts.
A Priori        Probability of      Probability of
Probability     Correct Detection   False Alarm      ν_SN       ν_N
of SN
0.000002        illegible           <0.0001          illegible  illegible
0.00001         0.873               0.0002           33.        6.1
0.0001          0.983               0.0006           33.        10.0
0.001           0.996               0.0021           31.        13.0
0.01            0.999               0.0065           27.        17.0
0.1             0.999               0.0174           24.        21.0

Table 2. Theoretical performance of the recognition procedure.

W_F/C = 10,000, W_M/W_F = 500, M = 36

APPLICATION TO CRYPTANALYSIS

The recognition procedure described in the preceding section was 
developed for use in cryptanalysis, the extraction of meaningful text 
from an enciphered text without a key. Although a particular cipher 
is used as an example, the method presented here is applicable to 
any cipher that preserves the redundancy of the concealed text. 

A cipher is a set of invertible transformations $\{f_j\}$ of a natural-language text $t$ into an enciphered text $x$,

$$f_j(t) = x. \qquad (7)$$

The subscript $j$ designates the particular transformation being used and is referred to as the key. The basic problem of cryptanalysis is to determine $f_j$ and $t$ from a given $x$. Shannon has shown [4] that cryptanalysis can be successful only if the concealed text contains redundancy.

A fundamental approach to cryptanalysis is to consider the set of all possible inverses of the message $x$, $\{f_j^{-1}(x)\}$. If $x$ has a large enough number of characters (greater than what Shannon calls the unicity distance), there will be a most probable text $t^*$ in $\{f_j^{-1}(x)\}$. This most probable text $t^*$ will have the same statistical structure as the natural language. Thus an enciphered text can be analyzed by examining all possible inverses $f_j^{-1}(x)$ for the structure of natural-language text.

Since the effectiveness of the recognition procedure depends on the redundancy of the natural language, the procedure is well suited for the above approach. In addition, the performance of the recognition procedure can be predicted directly from the theory.

Let us now apply the recognition procedure to an example of cryptanalysis. A cipher with 456,976 possible keys was used to encipher a typical segment of English text. This cipher, suggested by Shannon ([4], p. 709), produces enciphered texts with nearly uniform single-letter distributions. A computer was programmed to test all possible keys on the enciphered text, using the recognition procedure. The time per trial of a 72-character message was less than 100 seconds.
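The exhaustive key test can be sketched in Python as a toy illustration under stated assumptions. The Caesar shift `shift_decrypt` is a hypothetical stand-in for $f_j^{-1}$ (it is not the cipher used in the experiment, and its 26 keys are far fewer than 456,976); the decision points are constant hypothetical values rather than optimized $(A_m, \Gamma_m)$; and the prior is taken as even ($L_0 = 0$) rather than 0.000002. The $\lambda$ values are computed from the English probabilities of Table 1 (G read as 0.0199) with $\beta_i = 1/26$. The names `accepts` and `search_keys` are likewise hypothetical.

```python
import math
import string

ALPHABET = string.ascii_uppercase
# English single-letter probabilities from Table 1, in alphabetical order (G = 0.0199).
ENGLISH = [0.0815, 0.0144, 0.0276, 0.0379, 0.1311, 0.0292, 0.0199, 0.0526, 0.0635,
           0.0013, 0.0042, 0.0339, 0.0254, 0.0710, 0.0800, 0.0198, 0.0012, 0.0683,
           0.0610, 0.1047, 0.0246, 0.0092, 0.0154, 0.0017, 0.0198, 0.0008]
# lambda(Z) = ln(alpha / beta) with a uniform N source, beta = 1/26 (Eq. 3).
LAM = {z: math.log(26 * a) for z, a in zip(ALPHABET, ENGLISH)}


def shift_decrypt(x, key):
    """Hypothetical stand-in for f_j^{-1}: a Caesar shift, NOT the paper's cipher."""
    return "".join(ALPHABET[(ALPHABET.index(ch) - key) % 26] for ch in x)


def accepts(text, L0=0.0, upper=5.0, lower=-5.0):
    """Sequential test with constant, hypothetical decision points; True means response A."""
    L = L0
    for ch in text:
        if L >= upper:
            return True
        if L <= lower:
            return False
        L += LAM[ch]  # Eq. (2)
    return L >= upper  # out of symbols: answer B unless L has reached the upper point


def search_keys(x, keys, decrypt=shift_decrypt):
    """Test every candidate inverse f_j^{-1}(x) and keep the keys that are accepted."""
    return [k for k in keys if accepts(decrypt(x, k))]


if __name__ == "__main__":
    plain = "ITISATRUTHUNIVERSALLYACKNOWLEDGED"
    cipher = shift_decrypt(plain, -7)  # shifting by -7 enciphers with key 7
    print(search_keys(cipher, range(26)))
```

Several wrong keys may also be accepted by so crude a test; in the procedure described above the optimized decision points and the small prior are what hold false alarms down.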

Table 3 compares the experimental results with those predicted by the theory. Since the maximum number of observations for the experiment, 72, was twice the number used in the theoretical analysis, the results for the experiment should be slightly better than those of the theory; and, indeed, except for $\nu_{SN}$, the experimental results are in excellent agreement with the theory. The difference in $\nu_{SN}$ is due to the difference in the maximum number of observations.





                                   Theoretical (M = 36)   Experimental (M = 72)
Probability of Correct Detection   0.754                  0.8
Probability of False Alarm         <0.0001                0.00008
ν_SN                               27.                    70.²
ν_N                                3.3                    3.5

Table 3. Comparison of the theoretical and experimental results.

W_F/C = 100, W_M/W_F = 500

A Priori Probability of SN = 0.000002



CONCLUSIONS 

The speed of detection for a deferred-decision process has been related to the source redundancy when the N source has a uniform distribution. This relation has been used to develop a recognition procedure for natural-language text. The theoretical performance of the recognition procedure has been supported by results obtained in the solution of a general problem of cryptanalysis.



² The discrepancy between theoretical and experimental results is accounted for by the increased $M$ of the experimental work.